SpMacho - Optimizing Sparse Linear Algebra Expressions with Probabilistic Density Estimation
Authors
Abstract
In the age of statistical and scientific databases, there is an emerging trend of integrating analytical algorithms into database systems. Many of these algorithms are based on linear algebra with large, sparse matrices. However, linear algebra expressions often contain multiplications of more than two matrices. Executing such sparse matrix chains is nontrivial, since the runtime depends on the parenthesization and on the physical properties of intermediate results. Our approach aims to relieve data scientists of the burden of selecting appropriate algorithms, matrix storage representations, and execution paths. In this paper, we present a sparse matrix chain optimizer (SpMachO) that creates an execution plan composed of multiplication operators and transformations between sparse and dense matrix storage representations. We introduce a comprehensive cost model for sparse, dense, and hybrid multiplication kernels. Moreover, we propose a sparse matrix product density estimator (SpProdest) for intermediate result matrices. We evaluated SpMachO and SpProdest using real-world matrices and random matrix chains.
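To illustrate why parenthesization and intermediate density matter, the following is a minimal sketch and not the SpMachO or SpProdest algorithm from the paper. It assumes nonzeros are independently and uniformly distributed, so the expected density of a product of an m x k and a k x n matrix is taken as 1 - (1 - rho_a * rho_b)^k, and it charges m * k * n * rho_a * rho_b expected scalar multiplications per product. The names `product_density` and `plan_chain` are hypothetical. Because the estimated density of a subchain depends on how it was split, carrying only the cheapest subplan's density forward is a heuristic.

```python
from functools import lru_cache


def product_density(rho_a, rho_b, k):
    """Expected density of A @ B, assuming independent, uniform nonzeros.

    An output entry is zero only if all k partial products are zero.
    This is a textbook-style assumption, not the paper's estimator.
    """
    return 1.0 - (1.0 - rho_a * rho_b) ** k


def plan_chain(dims, densities):
    """Pick a parenthesization by dynamic programming.

    dims: n + 1 dimensions, matrix i has shape dims[i] x dims[i + 1].
    densities: n input densities in [0, 1].
    Returns (estimated_cost, estimated_density, plan_string).
    """
    n = len(densities)

    @lru_cache(maxsize=None)
    def best(i, j):
        if i == j:
            return 0.0, densities[i], f"A{i}"
        candidates = []
        for s in range(i, j):
            cost_l, rho_l, plan_l = best(i, s)
            cost_r, rho_r, plan_r = best(s + 1, j)
            m, k, cols = dims[i], dims[s + 1], dims[j + 1]
            # Expected number of scalar multiplications (assumed cost model).
            cost = cost_l + cost_r + m * k * cols * rho_l * rho_r
            rho = product_density(rho_l, rho_r, k)
            candidates.append((cost, rho, f"({plan_l} {plan_r})"))
        return min(candidates, key=lambda c: c[0])

    return best(0, n - 1)


if __name__ == "__main__":
    # Two sparse 1000 x 1000 matrices followed by a dense 1000 x 1 vector.
    cost, rho, plan = plan_chain([1000, 1000, 1000, 1], [0.01, 0.01, 1.0])
    print(plan, cost, rho)
```

In this example the sketch prefers multiplying the second matrix with the vector first, because the product of the two sparse matrices is estimated to be much denser than either input, which makes the left-to-right order more expensive under the assumed cost model.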
Similar resources
Robust Estimation in Linear Regression with Molticollinearity and Sparse Models
One of the factors affecting the statistical analysis of the data is the presence of outliers. The methods which are not affected by the outliers are called robust methods. Robust regression methods are robust estimation methods of regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...
Density Estimation by Total Variation Regularization
L1 penalties have proven to be an attractive regularization device for nonparametric regression, image reconstruction, and model selection. For function estimation, L1 penalties, interpreted as roughness of the candidate function measured by their total variation, are known to be capable of capturing sharp changes in the target function while still maintaining a general smoothing objective. We ...
Generic Programming for High Performance Numerical Linear Algebra
We present a generic programming methodology for expressing data structures and algorithms for high-performance numerical linear algebra. As with the Standard Template Library [14], our approach explicitly separates algorithms from data structures, allowing a single set of numerical routines to operate with a wide variety of matrix types, including sparse, dense, and banded. Through the use of ...
Automatic Generation of Sparse Tensor Kernels with Workspaces
Recent advances in compiler theory describe how to compile sparse tensor algebra. Prior work, however, does not describe how to generate efficient code that takes advantage of temporary workspaces. These are often used to hand-optimize important kernels such as sparse matrix multiplication and the matricized tensor times Khatri-Rao product. Without this capability, compilers and code generators...
Gene regulatory network inference using sparse probabilistic models
The main task of systems biology is to uncover mechanisms that regulate complex processes that take place in biological cells, especially the mechanisms of gene regulation. This project aims to identify gene regulatory interactions taking place in the early development of neural tube. Solutions proposed in this work for identification of transcription factors and their target genes are mostly b...